ProtNN: Fast and Accurate Nearest Neighbor Protein Function Prediction based on Graph Embedding in Structural and Topological Space

Abstract

Studying the function of proteins is important for understanding the molecular mechanisms of life. The number of publicly available protein structures has grown extremely large. Still, determining the function of a protein structure remains a difficult, costly, and time-consuming task. The difficulties are often due to the essential role of spatial and topological structures in the determination of protein functions in living cells. In this paper, we propose ProtNN, a novel approach for protein function prediction. Given an unannotated protein structure and a set of annotated proteins, ProtNN finds the nearest neighbor annotated structures based on protein-graph pairwise similarities. Given a query protein, ProtNN finds the nearest neighbor reference proteins based on a graph representation model and a pairwise similarity between vector embeddings of both query and reference protein-graphs in structural and topological spaces. ProtNN assigns to the query protein the function with the highest number of votes across the set of k nearest neighbor reference proteins, where k is a user-defined parameter. Experimental evaluation demonstrates that ProtNN is able to accurately classify several datasets in an extremely fast runtime compared to state-of-the-art approaches. We further show that ProtNN is able to scale up to a whole PDB dataset in a single-process mode with no parallelization, with a runtime gain on the order of thousands of times compared to state-of-the-art approaches.
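The pipeline described in the abstract (embed each protein graph as a vector, rank reference proteins by pairwise distance, then take a majority vote among the k nearest) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the four descriptors in `embed_graph`, the Euclidean distance, the function names, and the toy graphs and labels are not taken from the paper, which does not list its attribute set or similarity measure in the abstract.

```python
import math
from collections import Counter


def embed_graph(num_nodes, edges):
    """Map an undirected graph to a vector of topological descriptors.

    Assumption: node count, edge count, average degree and density are
    illustrative descriptors, not the attribute set used by ProtNN.
    """
    # Deduplicate undirected edges and ignore self-loops.
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    m = len(unique_edges)
    avg_degree = 2.0 * m / num_nodes if num_nodes else 0.0
    max_edges = num_nodes * (num_nodes - 1) / 2
    density = m / max_edges if max_edges else 0.0
    return [float(num_nodes), float(m), avg_degree, density]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(query_vec, references, k=3):
    """Return the majority label among the k nearest reference embeddings.

    `references` is a list of (embedding, function_label) pairs. Ties go to
    the label encountered first among the nearest neighbors.
    """
    ranked = sorted(references, key=lambda ref: euclidean(query_vec, ref[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy reference set: graphs and labels are invented for the example.
references = [
    (embed_graph(4, [(0, 1), (1, 2), (2, 3)]), "hydrolase"),
    (embed_graph(4, [(0, 1), (1, 2), (2, 3), (3, 0)]), "hydrolase"),
    (embed_graph(6, [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]), "transferase"),
    (embed_graph(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]), "transferase"),
]

query = embed_graph(4, [(0, 1), (1, 2), (2, 3), (1, 3)])
predicted = knn_predict(query, references, k=3)
print(predicted)  # hydrolase
```

In this sketch the descriptors sit on different scales, so the raw counts dominate the distance; a practical version would likely normalize each attribute before comparing vectors. Because every graph is reduced to a fixed-length vector once, classifying a query costs one distance computation per reference protein rather than a graph-to-graph comparison, which is consistent with the speed claims made in the abstract.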
